{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9.3 EM 算法另一种解释"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "EM算法的⽬标是找到具有潜在变量的模型的最⼤似然解，引入隐变量$Z$,所有模型参数集合为$\\theta$\n",
    "$$lnp(\\mathbf X|\\theta)=ln\\bigg\\{\\mathop\\sum_{\\mathbf Z}p(\\mathbf X,\\mathbf Z|\\mathbf{\\theta})\\bigg\\}$$\n",
    "目标是关于$\\theta$最大化似然函数$p(X|\\theta)$\n",
    "\n",
    "完整数据集的对数似然函数为$lnp(X,Z|\\theta)$，由于只有不完整的数据X，因此考虑隐变量的后验概率分布$p(Z|X,\\theta^{new})$，计算完整数据对数似然函数对于⼀般的参数值θ的期望:\n",
    "* 选择$\\theta^{old}$的初始值\n",
    "\n",
    "\n",
    "* E-step:使用当前参数值$\\theta^{old}$计算潜在变量的后验概率$p(Z|X,\\theta^{old})$，使用后验概率分布计算完整数据对数似然函数对于一般的参数值$\\theta$的期望$Q(\\theta,\\theta^{old})$\n",
    "\n",
    "\n",
    "* M-step:最大化期望，计算$\\theta^{new}$\n",
    "$$\\theta^{new}=\\mathop argmax_{\\theta}Q(\\theta,\\theta^{old})$$\n",
    "\n",
    "上式中我们操作的是联合概率分布$p(\\mathbf Z|\\mathbf X,\\theta^{old})$.且每个EM循环都会增⼤不完整数据的对数似然函数（除⾮已经达到局部极⼤值）\n",
    "\n",
    "其中:\n",
    "$$Q(\\theta,\\theta^{old})=\\mathop\\sum_{Z}p(Z|X,\\theta^{old})lnp(X,Z|\\theta)$$\n",
    "\n",
    "* 检查对数似然函数或者参数的收敛性，不满足收敛条件则$\\theta^{new}\\rightarrow \\theta^{old}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "用于高斯混合模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\begin{align}&p(z)=\\prod_{k=1}^K\\pi_k^{z_k}\\\\\n",
    "&p(x|z)=\\prod_{k=1}^K\\mathcal N(x|\\mu_k,\\Sigma_k)^{z_k}\\\\\n",
    "&\\Rightarrow p(X,Z|\\mu,\\Sigma,\\pi)=\\prod_{n=1}^N\\prod_{k=1}^K\\pi_{k}^{z_{nk}}\\mathcal N(x_n|\\mu_k,\\Sigma_k)^{z_{nk}}\\end{align}$$\n",
    "$z_{nk}$表示$z_n$的第k个分量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$lnp(X,Z|\\mu,\\Sigma.\\pi)=\\sum_{n=1}^N\\sum_{k=1}^Kz_{nk}\\bigg\\{ln\\pi_k+ln\\mathcal N(x_n|\\mu_k,\\Sigma_k)\\bigg\\}$$\n",
    "与不完整数据$\\mathbf X$对数似然函数对比，发现求和与对数运算交换了顺序\n",
    "$$lnp(X|\\pi,\\mu,\\Sigma)=\\sum_{n=1}^N ln\\bigg\\{\\sum_{k=1}^N\\pi_k\\mathcal N(x_n|\\mu_K,\\Sigma_k)\\bigg\\}$$\n",
    "由于$\\sum\\pi_k=1$,\n",
    "由Lagrange乘子法有:\n",
    "$$\\pi_k=\\frac{1}{N}\\sum_{n=1}^Nz_{nk}$$\n",
    "后验概率在n上进行分解有:\n",
    "$$\\mathbb E[z_{nk}]=\\frac{\\pi_k\\mathcal N(x_n|\\mu_k,\\Sigma_k)}{\\sum_{j=1}^K\\pi_j\\mathcal N(x_n|\\mu_j,\\Sigma_k)}\n",
    "=\\gamma(z_{nk})$$\n",
    "完整数据的对数期望值为:\n",
    "$$\\mathbb E_Z[lnp(X,Z|\\mu,\\Sigma,\\pi)]=\\sum_{n=1}^N\\sum_{k=1}^K\\gamma(z_{nk})\\bigg\\{ln \\pi_k+ln\\mathcal N(x_n|\\mu_k,\\Sigma_k)\\bigg\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 与k均值关系\n",
    "考虑一个高斯混合模型\n",
    "$$p(x|\\mu_k,\\Sigma_k)=\\frac{1}{(2\\pi \\epsilon)^{\\frac{D}{2}}}exp\\{-\\frac{1}{2\\epsilon}\\lVert x-\\mu_k \\lVert ^2\\}$$\n",
    "那么\n",
    "$$\\gamma(z_{nj})=\\frac{\\pi_kexp\\{-\\frac{1}{2\\epsilon}\\lVert x-\\mu_k \\lVert ^2\\}}{\\Sigma_j \\pi_j exp\\{-\\frac{1}{2\\epsilon}\\lVert x-\\mu_j \\lVert ^2\\}}$$\n",
    "\n",
    "\n",
    "令$\\epsilon \\rightarrow 0,$只有j项的$\\gamma(z_{nj})$趋近于1,其他项趋近于0,于是\n",
    "$$\\mathbb E_Z[lnp(X,Z|\\mu,\\Sigma,\\pi)]\\rightarrow-\\frac{1}{2}\\sum_{n=1}^N\\sum_{k=1}^Kr_{nk}\\Vert x_n-\\mu_k\\Vert^2+constant$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 贝叶斯线性回归的EM 算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "回到3.5.2节的证据近似问题，目标是关于$\\alpha,\\beta$最大化证据函数$p(t|\\alpha,\\beta)$\n",
    "\n",
    "\n",
    "边缘似然函数:\n",
    "$$p(t|\\alpha,\\beta)=\\int p(t|w,\\beta)p(w|\\alpha)dw$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "由于参数w已经被积分出去，因此我们可以将其当做\n",
    "⼀个潜在变量，因此我们可以使⽤EM算法来优化边缘似然函数。在E步骤中，我们计算在给\n",
    "定当前的α和β的条件下， w的后验概率分布，然后使⽤这个找到完整数据对数似然函数的期\n",
    "望。在M步骤中，我们关于α和β最⼤化这个量。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "完整数据的对数似然就是:\n",
    "$$lnp(t,w|\\alpha,\\beta)=lnp(t|,\\alpha,\\beta)+lnp(w|\\alpha)$$\n",
    "其中:\n",
    "$$p(t|w,\\beta)=\\prod_{n=1}^N\\mathcal N(t_n|w^T\\phi(x_n),\\beta^{-1})\\\\\n",
    "p(w|\\alpha)=\\mathcal N(w|,\\alpha^{-1}I)$$\n",
    "\n",
    "关于w的后验概率分布取期望有:\n",
    "$$\\mathbb E[lnp(t,w|\\alpha,\\beta)]=\\frac{M}{2}ln(\\frac{\\alpha}{2\\pi}-\\frac{\\alpha}{2}\\mathbb E[w^Tw]+\\frac{N}{2}ln(\\frac{\\beta}{2\\pi}-\\frac{\\beta}{2}\\sum_{n=1}^N\\mathbb E[(t_n-W^t\\phi_n)^2]$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关于$\\alpha$最大化:\n",
    "$$\\alpha=\\frac{M}{\\mathbb E[w^Tw]}=\\frac{M}{m^{T}_Nm_N+Tr(S_N)}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "与公式3.92$\\alpha=\\frac{\\gamma}{m^T_Nm_B}$不同，但会收敛到同样的结果"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "  ## 9.4一般形式的算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "目标最大化似然函数\n",
    "$$p(\\mathbf X|\\theta)=\\mathop\\sum_{\\mathbf Z}p(\\mathbf X,\\mathbf Z|\\mathbf{\\theta})$$\n",
    "引入一个定义在隐变量上的分布$q(Z)$\n",
    "\n",
    "由$lnp(X,Z|\\theta)=lnp(Z|X,\\theta)+lnp(X|\\theta)$有:\n",
    "\n",
    "$$lnp(X|\\theta)=\\mathcal L(q,\\theta)+KL(q\\Vert p)$$\n",
    "\n",
    "其中:\n",
    "$$\\mathcal L(q,\\theta)=\\mathop\\sum_Zq(Z)ln\\bigg\\{\\frac{p(X,Z|\\theta)}{q(Z)}\\bigg\\}\\\\KL(q\\Vert p)=-\\mathop\\sum_Zq(Z)ln\\bigg\\{\\frac{p(Z|X,\\theta)}{q(Z)}\\bigg\\}$$\n",
    "使用概率乘积规则将$lnp(X,Z|\\theta)=lnp({Z|X,\\theta})+lnp({X|\\theta})$代入可以验证"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为$KL(q||p)\\ge0$,所以$\\mathcal L(q,\\theta)$为$lnp(X|\\theta)$的一个下界"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 若当前参数向量为$\\theta^{old}$\n",
    "\n",
    "\n",
    "* E-step:\n",
    "  下界$\\mathcal L(q,\\theta^{old})$关于$q(Z)$被最大化，保  持$\\theta^{old}$不变。\n",
    "\n",
    "  $lnp(X|\\theta^{old})$的值与$q(Z)$无关，$\\mathcal L(q,\\theta)$的最大值出现在$KL(q\\Vert p)=0$时，此时$q(Z)=p(Z|X,\\theta^{old})$\n",
    "  \n",
    "  此时$$\\begin{align}\\mathcal L(q,\\theta)&=\\mathop\\sum_Zp(Z|X,\\theta^{old})lnp(X,Z|\\theta)-\\mathop\\sum_Zp(Z|X,\\theta^{old})ln(Z|X,\\theta^{old})\\\\\n",
    "&=\\mathcal Q(\\theta,\\theta^{old})+constant\\end{align}$$\n",
    "\n",
    "\n",
    "* M-step:$q(Z)$保持固定，下界$\\mathcal L(q,\\theta)$关于$\\theta$最大化，得到$\\theta^{new}$\n",
    "\n",
    "  $q(Z)=p(Z|X,\\theta^{old})\\neq p(Z|X,\\theta^{new})$,对数似然函数的增量大与下   界的增量\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table><tr>\n",
    "<td><img src=\"../pic/09-12.png\" ></td>\n",
    "<td><img src=\"../pic/09-13.png\" ></td>\n",
    "</tr>\n",
    "<tr>\n",
    "    <td align=\"center\">E-step</td>\n",
    "    <td align=\"center\">M-step</td>\n",
    "</tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于独立同分布的数据集来说，利用加和规则和乘积规则\n",
    "$$\\begin{align}p(Z|X,\\theta)&=\\frac{p(X,Z|\\theta)}{\\sum_Zp(X,Z|\\theta)}\\\\\n",
    "&=\\frac{\\prod^N_{n=1}p(x_n,z_n|theta)}{\\sum_Z \\prod^N_{n=1}p(x_n,z_n|theta)}\\\\\n",
    "&=\\prod^N_{n=1}p(z_n|x_n,\\theta)\\end{align}\n",
    "$$\n",
    "在高斯混合模型中意味着每个分量对一个数据点的影响至于该点的值和混合分布的参数$\\theta$有关"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "EM算法也可用于最大化模型的后验概率$p(\\theta|X)$:\n",
    "$$lnp(\\theta|X)-ln(\\theta,X)-lnp(X)$$\n",
    "一样使用下界函数和KL散度的分解方式有:\n",
    "$$\\begin{align}lnp(\\theta|X)&=\\mathcal L(q,\\theta)+KL(q\\lVert p)+lnp(\\theta)-lnp(X)\\\\&\\ge \\mathcal L(q,\\theta)+lnp(\\theta)-lnp(X)\\end{align}$$"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
